Advancing Linguistic Features and Insights by Label-informed Feature Grouping: An Exploration in the Context of Native Language Identification

Authors

  • Serhiy Bykh
  • Walt Detmar Meurers
Abstract

We propose a hierarchical clustering approach designed to group linguistic features for supervised machine learning that is inspired by variationist linguistics. The method makes it possible to abstract away from the individual feature occurrences by grouping features together that behave alike with respect to the target class, thus providing a new, more general perspective on the data. On the one hand, it reduces data sparsity, leading to quantitative performance gains. On the other, it supports the formation and evaluation of hypotheses about individual choices of linguistic structures. We explore the method using features based on verb subcategorization information and evaluate the approach in the context of the Native Language Identification (NLI) task.
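The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea of label-informed feature grouping, not the authors' implementation. It assumes that each feature is represented by its relative frequency across the target classes, that features are merged bottom-up with average linkage on Euclidean distance, and that grouped features are then replaced by their summed counts. The function names, the toy subcategorization features (such as `give:NP_NP`), and the L1 labels are all hypothetical.

```python
from collections import defaultdict


def label_profiles(docs, labels):
    """Map each feature to its relative frequency across the classes (assumed representation)."""
    classes = sorted(set(labels))
    counts = defaultdict(lambda: [0.0] * len(classes))
    for doc, label in zip(docs, labels):
        idx = classes.index(label)
        for feat, n in doc.items():
            counts[feat][idx] += n
    profiles = {}
    for feat, row in counts.items():
        total = sum(row)
        if total:  # skip features that never actually occur
            profiles[feat] = [c / total for c in row]
    return profiles


def group_features(profiles, n_groups):
    """Agglomerative clustering with average linkage (an assumption, not taken from the paper)."""
    clusters = [[f] for f in sorted(profiles)]

    def dist(a, b):
        # Average pairwise Euclidean distance between the label profiles of two clusters.
        total = 0.0
        for x in a:
            for y in b:
                total += sum((p - q) ** 2 for p, q in zip(profiles[x], profiles[y])) ** 0.5
        return total / (len(a) * len(b))

    while len(clusters) > n_groups:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]
    return clusters


def regroup(doc, clusters):
    """Replace individual feature counts with one summed count per feature group."""
    return [sum(doc.get(f, 0) for f in group) for group in clusters]


# Hypothetical toy data: verb subcategorization counts per learner text, with L1 labels.
docs = [
    {"give:NP_NP": 3, "tell:NP_S": 2},
    {"give:NP_NP": 2, "tell:NP_S": 3, "give:NP_PP": 1},
    {"give:NP_PP": 4, "explain:NP_PP": 2},
    {"give:NP_PP": 3, "explain:NP_PP": 3, "give:NP_NP": 1},
]
labels = ["DE", "DE", "ES", "ES"]

groups = group_features(label_profiles(docs, labels), n_groups=2)
grouped_docs = [regroup(doc, groups) for doc in docs]
print(groups)
print(grouped_docs)
```

On this toy data, the two features that occur mostly in the `ES` texts end up in one group and the two that occur mostly in the `DE` texts in the other, so each text is described by two dense group counts instead of four sparse feature counts. In a real experiment the profiles would have to be estimated on training data only, since they depend on the labels.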

Similar resources

Looking at Globalization of English in the Context of Internationalism

The present study is an attempt to provide a current synopsis of World Englishes within globalized communities, as well as theoretical and applied feasibility of global linguistic features of English as an International Language (EIL). To do so, first, three main reactions against the spread of English by scholars around the world are discussed. Then, the possibility of describing and teaching ...


Code-Copying in the Balochi Language of Sistan

This empirical study deals with language contact phenomena in Sistan. Code-copying is viewed as a strategy of linguistic behavior when a dominated language acquires new elements in lexicon, phonology, morphology, syntax, pragmatic organization, etc., which can be interpreted as copies of a dominating language. In this framework Persian is regarded as the model code which provides elements for b...


Constitutive Features of the Russian Political Discourse in Ecolinguistic Aspect

The article offers a comparative description of typological mechanisms used in political communicative practice and methods of verbal explication of its axiological and symbolic constituents determining universal mental features of individual/collective consciousness. The research position based on a systemic multilevel analysis of the component structure of discourse facilitates the identifica...


Gender-preferential Linguistic Elements in Applied Linguistics Research Papers: Partial Evaluation of a Model of Gendered Language

This article intended to investigate whether the gender-preferential linguistic elements found by Argamon, Koppel, Fine and Shimoni (2003) show the same gender-linked frequencies in applied linguistics research papers written by non-native speakers of English. In so doing, a sample of 32 articles from different journals was collected and the proportion of the targeted features to the whole numb...


The Discursive Construction of “Native” and “Non-Native” Speaker English Teacher Identities in Japan: A Linguistic Ethnographic Investigation

Recent poststructuralist theories of identity posit identities as being discursively constructed in interactions with society, institutions, and individuals. This study used a Linguistic Ethnographic framework to investigate the discursive identity construction of two English teachers, one ‘non-native’ English speaker, and one ‘native’ English speaker, teaching English in a tertiary institution...




Publication date: 2016